---
title: 'Ghana Wave II study: Spill-over analysis'
output:
  bookdown::html_document2:
            toc: yes
            fig_caption: yes
date: "`r Sys.Date()`"
---

```{r setup, include=FALSE}

knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(out.width="1000px", dpi=120)
knitr::opts_chunk$set(fig.align='center')
#
options(repos = list(CRAN="http://cran.rstudio.com/"))
options(scipen=999)
#
neededPackages <- c("dplyr","ggplot2", "data.table","magrittr", "kableExtra","wesanderson","ggpubr", "outliers", "gridExtra","cowplot","gtable","gt","pander","ggstatsplot","viridis","ggExtra","sjPlot","janitor","goft","patchwork","gridGraphics","e1071","moments","poweRlaw","BayesFactor","truncgof","EnvStats","shiny","fitdistrplus","eva","nortest","lmerTest", "lmtest","sandwich", "fixtest","miceadds", "estimatr", "tibble", "ggeffects","fixest", "marginaleffects")

newPackages <- neededPackages[!(neededPackages %in% installed.packages()[,"Package"])]
if(length(newPackages)) install.packages(newPackages)
lapply(neededPackages, require, character.only = TRUE)
rm(neededPackages,newPackages)
#
# library(remotes)
# install_github("lrberge/fixest")
# install_github("vincentarelbundock/marginaleffects")
#

```

```{r data}
#
df<- fread("blinded_data.csv")
colnames(df)<- tolower(colnames(df))
df$community<-as.factor(df$community)
df$tbhealth<- as.factor(df$tbhealth)
df$tbhealthplustext<- as.factor(df$tbhealthplustext)
df$tbhealthplus3<- as.factor(df$tbhealthplus3)
#
df_subset<- df %>%
            dplyr::select(responseid, community, identify_respondents, age, gender, ethnicity, work_situation, education,allocation, n_household,n_children,tbhealth,tbhealthplustext, tbhealthplus3, subj_attended, household_attended, population, community_attended, n_household, community_treatment)
#
### Create a variable which indicates the different level of treatments in the same variable:
#
df_subset<- df_subset %>%
            mutate(individual_treatment= ifelse(tbhealth==1,"tbhealth",
                                         ifelse(tbhealthplustext==1, "tbhealthplustext",
                                         ifelse(tbhealthplus3==1, "tbhealthplus3", NA))))
#
```

```{r subject_trt_effect}
#
mod_subject<-feglm(subj_attended~individual_treatment, cluster="community", dat=df_subset, family="binomial")

```

## Summary of findings

This report provides the results of the analyses conducted to assess the occurrence of Within household spill-over effect or Proximity spill-over effect on the pilot sample including 78 villages from 2 districts in Ghana for a total of 2,028 individuals sampled and randomised to three treatments: Health (video message), Health+Text, and Health+3USD.

The analyses did not show any evidence of a Within household or Proximity spill-over effects when comparing the attendance to a TB screening clinic of untreated members of the respondent's household, the untreated villagers, and the untreated villagers excluding the members of the respondent's household, following the administration of one of the three above-mentioned treatment to the respondents.

The significance has been evaluated at a level of 0.05, as per convention.

### Within household treatment effect spill-over

Following the study protocol, the enumerators sample only one individual within each sampled household, but they collect the information about the other members of the household. The variable n_household provides an indication on the number of members in a certain household, while the variable household_attended provides an indication on the number of household members who have attended the TB screening clinic.

The hypotheses we are aiming to assess in this section are:

-   H0: "There is No spill-over effect - the TB screening rates are similar across the three treatments arms"
-   H1(p): "The TB screening rate for the treatment arm Health message + 3.00USD is larger than the TB screening rate for the treatment arm only including Health message"
-   H1(s): "The TB screening rate for the treatment arm Health message + Voice reminder is larger than the TB screening rate for the treatment arm only including Health message"
-   H1(s): "The TB screening rate for the treatment arm Health message + 3.00USD is larger than the TB screening rate for the treatment arm with Health message + Voice reminder"

**Note:**p stands for primary and s for secondary.

We have created a variable called "individual_treatment" which has a value for all the subjects in the dataset corresponding to the treatment arm they have been randomized into.

In order to test our hypotheses, we have considered a count model (family= Poisson, link=logit), where as outcome we have reported household_attended and we have adjusted by individual_treatment.

To ensure robustness of our findings, we have conducted two sensitivity analyses including the household size in the estimation process in case n_household \> 0. In case of n_household=0, the outcome variable would be always 0 - since there would not be additional members of the household visiting the TB clinic.

From the questionnaire it was unclear if n_household included children. In 500 rows of the dataset n_children \> n_household, therefore it appears that the children have been counted separately.

Our opinion is that considering the results of the sensitivity analysis would be more robust than focusing on the output of the model run on the entire dataset. The reasoning behind it is that in the case of the 14 one-person households, household_attended = 0 won't have the same interpretation of household_attended=0 in household having more than one member.

```{r household_spillover, results='hide'}

### FULL COUNT MODEL 
#
pois_household<- feglm(household_attended ~ individual_treatment, data = df_subset, family="poisson", cluster = "community") 
#

## SENSITIVITY ANALYSIS 1: n_household as offset
#
df_household_filtered<- df_subset %>%
                        filter(n_household > 0)
#
pois_household_sa<- feglm(household_attended ~ individual_treatment+ offset(log(n_household)), data=df_household_filtered, family="poisson", cluster="community")
#

```

*Table 1* and *Table 2* below show that there is no evidence to believe there is a spill-over effect when comparing Health+Text and Health+3USD treatments vs Health within households considering the entire dataset and considering only the household with a membership \>= 2.

*Table 1 - Spill-over samples: rate ratios from regression of TB screening attendance on treatments (entire dataset)*
<center>
```{r output_household_spillover}

#
tab_model(pois_household, show.r2 = F,
          dv.labels = c("Household attendance"),
          pred.labels = c("Intercept", "Health + 3USD", "Health + text"))
#

```
</center>

<center>
*Table 2 - Spill-over samples: rate ratios from regression of TB screening attendance on treatments (not including one-person households)*

```{r output_household_spillover_sa}

#
tab_model(pois_household_sa, show.r2 = F,
          dv.labels = c("Household attendance"),
          pred.labels = c("Intercept", "Health + 3USD", "Health + text"))
#

```
</center>

In order to investigate further your alternative hypotheses, we have thought of offering an overview of the pairwise comparisons of the incidence rate ratios associated with the different treatments in *Table 3* It appears there is no statistically significant difference (at a significance threshold of 0.05) in the marginal incidence rate ratios when comparing Health+3USD and Health+Text, as well as Health+3USD and Health+Text to Health only.

**Note:** p-values and SE estimates are different from the main model because in the main model we were representing the slopes of the incidence in the number of additional household members attending the TB screening - the expected outcome associated to Health+3USD and Health+Text when compared to Health. In the contrasts we are looking at the difference in outcome between two treatment levels (which is expressed by a ratio since we have anti-logged our estimates) and since we are referring to predictions, the 95% intervals are technically prediction intervals more than confidence intervals.

```{r marginal_effects_household}
#
#
cmp_household<-avg_comparisons(pois_household_sa,
                           variables= list(individual_treatment="pairwise"), 
                           comparison="ratio", 
                           hypothesis=1) %>%
                           as.data.frame()
#

#
```

<center>
*Table 3 - Within spill-over sample: contrasts*

```{r output_contrasts, warning=FALSE, message=FALSE, include=TRUE}

cmp_household %>%
  dplyr::mutate(across(is.numeric, round, 4)) %>%
  dplyr::mutate(ci=paste0("[",conf.low,",",conf.high,"]")) %>%
  dplyr::select(-term, -s.value, -conf.low, -conf.high, -statistic) %>%
  rename("Contrast"=contrast, "Estimate"=estimate, "Std.error"=std.error,
         "P-value"=p.value, "95% CI"=ci) %>%
  kbl(caption="") %>%
  kable_styling(full_width = F)

```
</center>

### Proximate neighbour treatment effect spill-over

The population has been recorded for each village in the trial. In this case, the main interest is to assess the presence of any spill-over effect when considering the other villagers apart from the respondent who attended the TB screening clinic (community_untreated1) and the villagers apart from the respondent and the members of their family who have attended the TB screening clinic (community_untreated2). Community_untreated1 will be given by community_attended - tot(subj_attended); while community_untreated 2 will be given by community_attended-tot(subj_attended) - tot(household_attended).

The hypotheses we are aiming to assess in this section are:

-   H0: "There is No spill-over effect - the TB screening rates are similar across the three treatments arms"
-   H1(p): "The TB screening rate for the treatment arm Health message + 3.00USD is larger than the TB screening rate for the treatment arm only including Health message"
-   H1(s): "The TB screening rate for the treatment arm Health message + Voice reminder is larger than the TB screening rate for the treatment arm only including Health message"
-   H1(s): "The TB screening rate for the treatment arm Health message + 3.00USD is larger than the TB screening rate for the treatment arm with Health message + Voice reminder"

**Note:**p stands for primary and s for secondary.

In order to test our hypotheses, we have considered a count model (family= Poisson, link=logit), where as outcome we have reported community_untreated1 and community_untreated2, we have adjusted by community_treatment and added an offset to account for the different size of each village in terms of population.

*Table 4* and *Table 5* below show that there is no evidence to believe there is a proximity spill-over effect when comparing Health+Text and Health + 3USD treatments vs Health considering all the untreated villagers and only the untreated villagers who do not belong to the respondent's household.

```{r, include=FALSE, message=FALSE, warning=FALSE}

#
cu1<- df_subset %>%
      group_by(community) %>%
      summarise(tot_subj_attended=sum(subj_attended),
                community_untreated1=community_attended - tot_subj_attended) %>%
        dplyr::select(community, community_untreated1) %>%
        distinct()
#
cu2<- df_subset %>%
      group_by(community) %>%
      summarise(tot_subj_attended=sum(subj_attended),
                tot_household_attended= sum(household_attended), 
                cu=community_attended - tot_subj_attended,
                community_untreated2=cu-tot_household_attended) %>%
        dplyr::select(community, community_untreated2) %>%
        distinct()
#
df_community1<- inner_join(df_subset, cu1, by="community") %>%
                mutate( community_untreated1=as.numeric(community_untreated1),
                        community_treatment=as.factor(community_treatment))
#
df_community2<- inner_join(df_subset, cu2, by="community") %>%
                mutate( community_untreated2=as.numeric(community_untreated2),
                        community_treatment=as.factor(community_treatment))
#
pois_untreated1<- feglm(community_untreated1~community_treatment+offset(log(population)), data=df_community1, family=quasipoisson, cluster="community")#
#
pois_untreated2<- feglm(community_untreated2~community_treatment+offset(log(population)), data=df_community2, family=quasipoisson, cluster="community")#
#

## Assumption on dispersion checked for both models by running a glmer, slight violation of dispersion (pvalue = 0.016) - used quasipoisson family instead.
#
#
```

*Table 4 - Spill-over samples: rate ratios from regression of TB screening attendance for untreated villagers on community treatments (all untreated villagers)*
<center>
```{r output_untreated1_spillover}

tab_model(pois_untreated1,  show.r2 = F,
          dv.labels = c("Community untreated 1"),
          pred.labels = c("Intercept", "Health + 3USD", "Health + text"))
#

```
</center>

*Table 5 - Spill-over samples: rate ratios from regression of TB screening attendance for untreated villagers on community treatments (all untreated villagers except the members of the respondent's household)*

<center>
```{r output_untreated2_spillover}

tab_model(pois_untreated2, show.r2 = F,
          dv.labels = c("Community untreated 2"),
          pred.labels = c("Intercept", "Health + 3USD", "Health + text"))
#

```
</center>

In order to investigate further your alternative hypotheses, we have thought of offering an overview of the pairwise comparisons of the incidence rate ratios associated with the different treatments in *Table 6* and *Table 7*. It appears there is no statistically significant difference (at a significance threshold of 0.05) in the marginal incidence rate ratios when comparing Health + 3USD and Health + Text, as well as Health+3USD and Health+Text to Health only.

```{r marginal_effects_proximity}
#
#
cmp_proximity1<-avg_comparisons(pois_untreated1,
                                variables= list(community_treatment="pairwise"), 
                                comparison="ratio", 
                                hypothesis=1) %>%
                                as.data.frame()
#
#
cmp_proximity2<-avg_comparisons(pois_untreated2,
                                variables= list(community_treatment="pairwise"), 
                                comparison="ratio", 
                                hypothesis=1) %>%
                                as.data.frame()
#
```

*Table 6 - Proximity spill-over sample: contrasts (all untreated villagers)*

<center>
```{r output_contrasts_proximity1,message=FALSE, warning=FALSE}

cmp_proximity1 %>%
  dplyr::mutate(across(is.numeric, round, 4)) %>%
  dplyr::mutate(ci=paste0("[",conf.low,",",conf.high,"]")) %>%
  dplyr::select(-term, -s.value, -conf.low, -conf.high, -statistic) %>%
  rename("Contrast"=contrast, "Estimate"=estimate, "Std.error"=std.error,
         "P-value"=p.value, "95% CI"=ci) %>%
  kbl(caption="") %>%
  kable_styling(full_width = F)

```
</center>


*Table 7 - Proximity spill-over sample: contrasts (all untreated villagers except the members of the respondent's household)* 

<center>
```{r output_contrasts_proximity2,warning=FALSE, message=FALSE}

cmp_proximity2 %>%
  dplyr::mutate(across(is.numeric, round, 4)) %>%
  dplyr::mutate(ci=paste0("[",conf.low,",",conf.high,"]")) %>%
  dplyr::select(-term, -s.value, -conf.low, -conf.high, -statistic) %>%
  rename("Contrast"=contrast, "Estimate"=estimate, "Std.error"=std.error,
         "P-value"=p.value, "95% CI"=ci) %>%
  kbl(caption="") %>%
  kable_styling(full_width = F)

```
</center>